Object Detection and Rendering for Wide Field of View (WFOV) Image Acquisition Systems

ABSTRACT

An image acquisition device having a wide field of view includes a lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The device has an object detection engine that includes one or more cascades of object classifiers, e.g., face classifiers. A WFoV correction engine may apply rectilinear and/or cylindrical projections to pixels of the WFoV image, and/or non-linear, rectilinear and/or cylindrical lens elements or lens portions serve to prevent and/or correct distortion within the original WFoV image. One or more objects located within the original and/or distortion-corrected WFoV image is/are detectable by the object detection engine upon application of the one or more cascades of object classifiers.

PRIORITY

This application claims the benefit of priority under 35 USC §119 to U.S. provisional patent application No. 61/311,264, filed Mar. 5, 2010. This application is one of a series of contemporaneously-filed patent applications including United States patent application (Atty. Docket FN-353A-US, FN-353B-US, and FN-353C-US), each of which is incorporated by reference.

BACKGROUND

Face detection methods have become very well established within digital cameras in recent years. This technology brings a range of benefits including enhanced acquisition of the main image and adaptation of the acquisition process to optimize image appearance and quality based on the detected faces.

More recently, newer consumer cameras have begun to feature wide field of view (WFOV) imaging systems, and as the benefits of obtaining a wider scene become apparent to consumers, it is expected that further growth will ensue in such imaging systems along with an ability to achieve even wider fields of view over time. In professional cameras, such WFOV imaging systems are better known, the most well known being the fish-eye lens. WFOV imaging systems are also used in a range of applications including Google's “street-view” technology and for some video-phone systems where they enable a number of people sitting at a table to be imaged by a single sensor and optical system.

Mapping a WFOV image onto a rectilinear image sensor is non-trivial, and a wide range of different techniques are available depending on the exact form of the WFOV lens and associated optical elements. The desired image perspective is also important.

Unfortunately, due to the complexity of WFOV imaging systems, the benefits of face detection technologies have not been successfully applied to such systems. In particular, faces near the center of a WFOV camera appear closer to the camera and experience some geometrical distortions. Faces about mid-way from the center appear at approximately the correct distances from the camera and experience less significant distortions. Faces towards the edge experience very significant geometrical distortions. The exact nature of each of these types of perspective and geometrical distortion depends on the nature of the lens and optical system.

Clearly a conventional face detection or face tracking system employing rectangular classifiers or integral image techniques cannot be conveniently applied directly to such faces. Accordingly, methods are desired to adapt and compensate for image distortions within such WFOV imaging systems so that face detection technologies can be successfully employed in devices like digital cameras and video phone systems.

The following is from http://www.panorama-numerique.com/squeeze/squeeze.htm, where it is referred to as “Correcting wider than 90° rectilinear images to print or to display architecture panoramas,” by Georges Lagarde. The indicated point is to remove stretching near the sides of a wide angle shot. Mr. Lagarde indicates that one simply has to “just squeeze your panos!” However, in practice, there are greater complexities than that. This application provides several embodiments after this introduction for displaying panoramas without all the inherent distortion.

Mr. Lagarde points out that 180° panoramic images require large screen real-estate. Reduced to a more usual size, Mr. Lagarde presents the examples illustrated at FIGS. 1A-1G. While panoramic images are typically difficult to appraise, displaying them in a narrow window has generally been avoided, and instead a 1280×1024 screen or “larger” and a fast Internet connection may be typically recommended.

Mr. Lagarde points out that the exact same source images of FIGS. 1A-1G (showing the Préfecture building in Grenoble, France) were used in a previous tutorial, Rectilinear/cylindric/equirectangular selection made easy, and that the possibility of obtaining different but acceptable panoramic images by stitching the same source images and then using different projection modes is implied here and there.

FIG. 1A illustrates Piazza Navona, Roma by Gaspar Van Wittel, 1652-1736 (Museo Thyssen-Bornemisza, Madrid).

Mr. Lagarde indicates that most photographers restrict themselves to subjects which can be photographed with a rectilinear lens (plane projection). A small number of them sometimes use a fisheye lens (spherical projection) or a rotating lens camera (cylindrical projection) or a computer (stitcher programs make use of various projection modes), but when the field of view (horizontal FOV and/or vertical FOV) is higher than 90 degrees (or thereabouts; this actually depends on the subject) they are disturbed by the “excessive wide-angle distortion” found in the resulting images.

Adapting the usual projection modes to the subject and/or using multiple local projections to avoid this distortion is a violation of the classical perspective rules, but escaping classical perspective rules is exactly what sketchers and painters always did to avoid unpleasant images. Mr. Lagarde points out that this was explained by Anton Maria Zanetti and Antonio Conti using the words of their times (“Il Professore m'entendara”) when they described how the camera ottica was used by the seventeenth century Venetian masters. Because the field of view of the lenses available then was much lower than 90°, it is evident that a camera oscura was not able to display the very wide vedute they sketched and painted: the solution was to record several images and to stitch them onto the canvas to get a single view (strangely enough, the fact that the field of view is limited to about 90 degrees when one uses classical perspective, aka rectilinear projection on a vertical plane, is not handled in most perspective treatises).

Equivalent “tricks” can be used for photographic images:

-   Use of several projection planes (their number and location depending on the subject) for a single resulting image. This is the method explained by L. Zelnik-Manor in Squaring the Circle in Panoramas (see references).
-   Use of several projection modes (the selected modes depending on the subject) for a single resulting image. This is the method proposed by Buho (Eric S.) and used by Johnh (John Houghton) in Hybrid Rectilinear & Cylindrical projections (see references).
-   Use of an “altered rectilinear” projection (thus no longer rectilinear) where the modification is a varying horizontal compression, null in the center and high near the sides. This is the method proposed by Olivier_G (Olivier Gallen) in Panoramas: la perspective classique ne s'applique plus! (see references).
-   Use of a “squeezed rectilinear” projection (not an actual rectilinear one either) where the modification is a varying horizontal and vertical compression, null near the horizon (shown as a red line in the examples), null near a vertical line which goes through the main vanishing point (shown as a blue line in the examples), and increasing like tangent (angle) toward the sides (where angle corresponds to the angular distance between the point and the line).

If photographers like the results, no doubt they will use that.

Example 1 Cylindrical—180°

In a first example, referring now to FIG. 1B, an image is shown that is a 180° panorama where cylindrical projection mode is used to show a long building viewed from a short distance. Most people dislike images like this one, where except for the horizon, every straight horizontal line is heavily curved.

Example 2 Rectilinear—155°

The next image shown in FIG. 1C illustrates an attempt to use the rectilinear projection mode: every straight line in the buildings is rendered as a straight line. But, while rectilinear projection works well when the field of view is lower than 90 degrees, it should never be used when the field of view is larger than 120 degrees. In this image, though the field of view was restricted to 155 degrees (original panorama corresponds to 180°), the stretching is too high in the left and right parts and the result utterly unacceptable.
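The stretching described above follows from the geometry of plane projection. A ray at horizontal angle θ from the optical axis lands at x = f·tan θ under rectilinear projection and at x = f·θ under cylindrical projection, so the local horizontal scale of the rectilinear image relative to the cylindrical one is 1/cos²θ. The following minimal sketch evaluates that factor at the edge of the field of view; the function names and the unit focal length are illustrative assumptions and do not come from Mr. Lagarde's text.

```python
import math

def rectilinear_x(theta_deg, f=1.0):
    """Plane (rectilinear) projection: x = f * tan(theta)."""
    return f * math.tan(math.radians(theta_deg))

def cylindrical_x(theta_deg, f=1.0):
    """Cylindrical projection: x = f * theta, with theta in radians."""
    return f * math.radians(theta_deg)

def rectilinear_stretch(theta_deg):
    """Local horizontal scale of rectilinear relative to cylindrical:
    d(tan theta)/d(theta) = 1 / cos(theta)**2."""
    return 1.0 / math.cos(math.radians(theta_deg)) ** 2

if __name__ == "__main__":
    # The edge of the image sits at half the horizontal field of view.
    for fov in (90, 120, 155):
        edge = fov / 2.0
        print(f"FOV {fov} deg: edge stretch {rectilinear_stretch(edge):.1f}x")
```

Under these assumptions the edge stretch is about 2 for a 90° field, 4 for 120°, and more than 20 for 155°, which is consistent with the limits quoted in the example.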

Example 3 Squeezed Rectilinear—155°

Referring to FIG. 1D, because digital images can be squeezed at will, rather than discarding this previous rectilinear image, one can correct the excessive stretching. The result is no longer rectilinear (diagonal lines are somewhat distorted) but a much wider part of the buildings now has an acceptable look. The variable amount of squeezing used is shown by the dotted line near the top side: the closer the dots are, the more compressed was the corresponding part of the rectilinear original.

Example 4 Edges, from the 180° Cylindrical Version

Referring to FIG. 1E, the rendering of the main building is much better. Note that this view looks as if it were taken from a more distant point of view than in the cylindrical image: this is not true, the same source images were used for both panoramas.

Example 5 Center, from 155° Squeezed Rectilinear Version

Referring to FIG. 1F, the left most and right most parts of the squeezed image are improved, but they are still not very pleasant. A possible solution is to use the edge parts of the cylindrical version in a second layer:

Example 6 Squeezed Rectilinear (Center)+Cylindrical (Left and Right Edges)−180°

And finally, referring to FIG. 1G: This view can be compared with the example of FIG. 1B: each one shows exactly the same buildings and cars, and each comes from exactly the same source images.

The pictured buildings in FIGS. 1B-1G are located on the sides of a large square but, because there are many large trees on this square, standing back enough for a large field of view is not possible. The image shown in FIG. 1B illustrates photos that were actually taken at a rather short distance from the main building, while FIG. 1G suggests the viewer being much more distant from this building.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1G illustrate various conventional attempts to avoid distortion in images with greater than 90° field of view.

FIG. 2 schematically illustrates a wide field of view (WFOV) system that in one embodiment incorporates a face tracker.

FIG. 3(a) illustrates a wide horizontal scene mapped onto a full extent of an image sensor.

FIG. 3(b) illustrates a wide horizontal scene not mapped onto a full extent of an image sensor, and instead a significant portion of the sensor is not used.

FIG. 4 illustrates the first four Haar classifiers used in face detection.

FIGS. 4(a)-4(c) illustrate magnification of a person speaking among a group of persons within a WFOV image.

FIGS. 5(a)-5(c) illustrate varying the magnification of a person speaking among a group of persons within a WFOV image, wherein the degree of magnification may vary depending on the strength or loudness of the speaker's voice.

DETAILED DESCRIPTIONS OF THE EMBODIMENTS

An image acquisition device having a wide field of view is provided. The device includes at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The device also includes a control module and an object detection engine that includes one or more cascades of regular object classifiers. A WFoV correction engine of the device is configured to correct distortion within the original image. The WFoV correction engine processes raw image data of the original WFoV image. A rectilinear projection of center pixels of the original WFoV image is applied. A cylindrical projection of outer pixels of the original WFoV image is also applied. Modified center and outer pixels are combined to generate a distortion-corrected WFoV image. One or more objects located within the center or outer pixels, or both, of the distortion-corrected WFoV image are detectable by the object detection engine upon application of the one or more cascades of regular object classifiers.

The applying of the rectilinear projection to center pixels may also include applying a regular rectilinear projection to an inner portion of the center pixels and a squeezed rectilinear projection to an outer portion of the center pixels. The applying of the squeezed rectilinear projection to the outer portion of the center pixels may also include applying an increasingly squeezed rectilinear projection in a direction from a first boundary with the inner portion of the center pixels to a second boundary with the outer pixels.
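One way to picture the three zones described above (regular rectilinear, increasingly squeezed rectilinear, cylindrical) is as a single continuous mapping from horizontal angle to image coordinate. The sketch below is only an illustration under stated assumptions: the disclosure does not give a squeeze profile, so the linear blend of the local scale between the two boundaries, the boundary angles of 30° and 50°, and the name `hybrid_x` are all hypothetical choices.

```python
import math

def hybrid_x(theta_deg, inner_deg=30.0, center_deg=50.0, f=1.0):
    """Map a horizontal angle to an image coordinate in three zones.

    |theta| <= inner_deg            : regular rectilinear, x = f*tan(theta)
    inner_deg < |theta| <= center_deg: squeezed rectilinear; the local scale
        dx/dtheta falls linearly from the rectilinear value at the first
        boundary to the cylindrical value (1) at the second boundary
        (an assumed profile, not taken from the source)
    |theta| > center_deg            : cylindrical, dx/dtheta = f
    """
    if not 0.0 < inner_deg < center_deg < 90.0:
        raise ValueError("need 0 < inner_deg < center_deg < 90")
    t = math.radians(abs(theta_deg))
    t1 = math.radians(inner_deg)
    t2 = math.radians(center_deg)
    s1 = 1.0 / math.cos(t1) ** 2  # rectilinear scale at the first boundary
    if t <= t1:
        x = math.tan(t)
    elif t <= t2:
        d = t - t1
        # Integral of the linearly decreasing scale from t1 to t.
        x = math.tan(t1) + s1 * d + (1.0 - s1) * d * d / (2.0 * (t2 - t1))
    else:
        x_at_t2 = math.tan(t1) + 0.5 * (s1 + 1.0) * (t2 - t1)
        x = x_at_t2 + (t - t2)
    return math.copysign(f * x, theta_deg)
```

Because the mapping is built by integrating a positive local scale, it stays monotonic and continuous across both boundaries, so the corrected center and outer pixels join without a gap or overlap. A real implementation would invert such a mapping to resample the raw image.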

Another image acquisition device having a wide field of view is provided. The device includes at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°, a control module, and an object detection engine that includes one or more cascades of modified object classifiers. The modified object classifiers include a first subset of rectilinear classifiers to be applied to objects appearing in center pixels of the WFoV image, and a second subset of cylindrical classifiers to be applied to objects appearing in outer pixels of the WFoV image. One or more objects located within the center or outer pixels, or both, of the original WFoV image are detectable by the object detection engine upon application of the one or more cascades of modified object classifiers, including the first subset of rectilinear classifiers and the second subset of cylindrical classifiers, respectively.

The first subset of rectilinear classifiers may include a subset of regular rectilinear classifiers with which objects appearing in an inner portion of the center pixels are detectable, and a subset of squeezed rectilinear classifiers with which objects appearing in an outer portion of the center pixels are detectable. The subset of squeezed rectilinear classifiers may include subsets of increasingly squeezed rectilinear classifiers with which objects appearing in the outer portion of the center pixels are increasingly detectable in a direction from a first boundary with the inner portion of the center pixels to a second boundary with the outer pixels.
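The selection among classifier subsets described above can be sketched as a lookup keyed on where a candidate window sits horizontally in the uncorrected image. This is a hypothetical illustration: the function name, the region fractions (inner third, center two thirds), and the number of squeeze levels are assumptions chosen for the example, and the disclosure does not prescribe them.

```python
def classifier_subset(x, width, inner_frac=1/3, center_frac=2/3, squeeze_levels=3):
    """Pick a classifier family for a window centered at pixel column x.

    Returns (family, level). level is 0 except for squeezed rectilinear
    classifiers, where it runs from 1 (just outside the inner portion)
    up to squeeze_levels (next to the outer pixels).
    """
    if not 0 <= x < width:
        raise ValueError("x must lie inside the image")
    # Normalized distance from the image center: 0 at center, ~1 at an edge.
    r = abs((x + 0.5) - width / 2.0) / (width / 2.0)
    if r <= inner_frac:
        return ("regular_rectilinear", 0)
    if r <= center_frac:
        # Position inside the squeezed band, from 0 (first boundary)
        # to 1 (second boundary).
        pos = (r - inner_frac) / (center_frac - inner_frac)
        level = min(squeeze_levels, 1 + int(pos * squeeze_levels))
        return ("squeezed_rectilinear", level)
    return ("cylindrical", 0)
```

For instance, with a 1200-pixel-wide image, a window at column 600 would use the regular rectilinear subset, one at column 990 the most strongly squeezed subset, and one at column 0 the cylindrical subset.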

The device may also include a WFoV correction engine configured to correct distortion within the original image. The WFoV correction engine may process raw image data of the original WFoV image. A rectilinear mapping of center pixels of the original WFoV image may be applied. A cylindrical mapping of outer pixels of the original WFoV image may also be applied. Modified center and outer pixels may be combined to generate a distortion-corrected WFoV image.

A method is provided for acquiring wide field of view images with an image acquisition device having at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The method includes acquiring the original WFoV image. Distortion is corrected within the original WFoV image by processing raw image data of the original WFoV image. A rectilinear projection is applied to center pixels of the original WFoV image and a cylindrical projection is applied to outer pixels of the original WFoV image. Modified center and outer pixels are combined to generate a distortion-corrected WFoV image. One or more cascades of regular object classifiers are applied to detect one or more objects located within the center or outer pixels, or both, of the distortion-corrected WFoV image.

The applying of a rectilinear projection to center pixels may include applying a regular rectilinear projection to an inner portion of the center pixels and a squeezed rectilinear projection to an outer portion of the center pixels. The applying of a squeezed rectilinear projection to the outer portion of the center pixels may include applying an increasingly squeezed rectilinear projection in a direction from a first boundary with the inner portion of the center pixels to a second boundary with the outer pixels.

A further method is provided for acquiring wide field of view images with an image acquisition device having at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The method includes acquiring the original WFoV image. One or more cascades of modified object classifiers are applied. A first subset of rectilinear classifiers is applied to objects appearing in center pixels of the WFoV image, and a second subset of cylindrical classifiers is applied to objects appearing in outer pixels of the WFoV image. One or more objects located within the center or outer pixels, or both, of the original WFoV image is/are detected by the applying of the modified object classifiers, including the applying of the first subset of rectilinear classifiers and the applying of the second subset of cylindrical classifiers, respectively.

The applying of the first subset of rectilinear classifiers may include applying a subset of regular rectilinear classifiers with which objects appearing in an inner portion of the center pixels are detectable, and/or applying a subset of squeezed rectilinear classifiers with which objects appearing in an outer portion of the center pixels are detectable. The applying of the subset of squeezed rectilinear classifiers may include applying subsets of increasingly squeezed rectilinear classifiers with which objects appearing in the outer portion of the center pixels are increasingly detectable in a direction from a first boundary with the inner portion of the center pixels to a second boundary with the outer pixels.

The method may include correcting distortion within the original image by processing raw image data of the original WFoV image including applying a rectilinear mapping of center pixels of the original WFoV image and a cylindrical mapping of outer pixels of the original WFoV image, and combining modified center and outer pixels to generate a distortion-corrected WFoV image.

One or more processor-readable media are also provided having embedded therein code for programming a processor to perform any of the methods described herein.

Another image acquisition device having a wide field of view is provided. The device includes at least one non-linear lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The non-linear lens is configured to project a center region of a scene onto the middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region. The device also includes an object detection engine including one or more cascades of regular object classifiers. A WFoV correction engine of the device is configured to correct distortion within the original WFoV image. The WFoV correction engine processes raw image data of the original WFoV image. A cylindrical projection of outer pixels of the original WFoV image is applied. Center pixels and modified outer pixels are combined to generate a distortion-corrected WFoV image. One or more objects located within the center or outer pixels, or both, of the distortion-corrected WFoV image are detectable by the object detection engine upon application of the one or more cascades of regular object classifiers.

Another image acquisition device having a wide field of view is provided. The device includes at least one non-linear lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The non-linear lens is configured to project a center region of a scene onto the middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region. An object detection engine includes one or more cascades of modified object classifiers including a subset of cylindrical classifiers to be applied to objects appearing in outer pixels of the WFoV image. One or more objects located within the center or outer pixels, or both, of the original WFoV image are detectable by the object detection engine upon application of the one or more cascades of modified object classifiers, including a subset of regular classifiers and the subset of cylindrical classifiers, respectively.

The device may include a WFoV correction engine configured to correct distortion within the original image. The WFoV correction engine processes raw image data of the original WFoV image. A cylindrical mapping of outer pixels of the original WFoV image is performed. Center pixels and modified outer pixels are combined to generate a distortion-corrected WFoV image.

Another method is provided for acquiring wide field of view images with an image acquisition device having at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The method includes acquiring the original WFoV image, including utilizing at least one non-linear lens to project a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region. Distortion is corrected within the original WFoV image by processing raw image data of the original WFoV image. A cylindrical projection of outer pixels of the original WFoV image is applied. Center pixels and modified outer pixels are combined to generate a distortion-corrected WFoV image. One or more objects are detected by applying one or more cascades of regular object classifiers to one or more objects located within the center or outer pixels, or both, of the distortion-corrected WFoV image.

A further method is provided for acquiring wide field of view images with an image acquisition device having at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The method includes acquiring the original WFoV image, including utilizing at least one non-linear lens to project a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region. One or more modified object classifiers are applied. A subset of cylindrical classifiers is applied to objects appearing in outer pixels of the WFoV image, and a subset of regular classifiers is applied to objects appearing in center pixels of the WFoV image. One or more objects located within center or outer pixels, or both, of the original WFoV image are detected by the applying of the one or more cascades of modified object classifiers, including the applying of the subset of regular classifiers and the applying of the subset of cylindrical classifiers, respectively.

The method may include correcting distortion within the original WFoV image by processing raw image data of the original WFoV image, including applying a cylindrical mapping of outer pixels of the original WFoV image, and combining center pixels and modified outer pixels to generate a distortion-corrected WFoV image.

One or more processor-readable media having embedded therein code is/are provided for programming a processor to perform any of the methods described herein of processing wide field of view images acquired with an image acquisition device having an image sensor and at least one non-linear lens to project a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region to acquire an original wide field of view (WFoV) image with a field of view of more than 90°.

Another image acquisition device having a wide field of view is provided. The device includes a lens assembly and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The lens assembly includes a compressed rectilinear lens to capture a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region. The device also includes a cylindrical lens on one or both sides of the compressed rectilinear lens to capture outer regions of the scene onto outer portions of the image sensor such as to directly provide a cylindrical mapping of the outer regions. An object detection engine of the device includes one or more cascades of regular object classifiers. One or more objects located within the center or outer pixels, or both, of the original WFoV image is/are detectable by the object detection engine upon application of the one or more cascades of regular object classifiers.

Another image acquisition device having a wide field of view is provided. The device includes a lens assembly and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The lens assembly includes a lens having a compressed rectilinear center portion to capture a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region. The lens also includes cylindrical outer portions on either side of the compressed rectilinear portion to capture outer regions of the scene onto outer portions of the image sensor such as to directly provide a cylindrical mapping of the outer regions. An object detection engine of the device includes one or more cascades of regular object classifiers. One or more objects located within the center or outer pixels, or both, of the original WFoV image is/are detectable by the object detection engine upon application of the one or more cascades of regular object classifiers.

Another image acquisition device having a wide field of view is provided. The device includes multiple cameras configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The original wide field of view image includes a combination of multiple images captured each with one of the multiple cameras. The multiple cameras include a first camera having a first image sensor and a compressed rectilinear lens to capture a center region of a scene onto the first sensor such as to directly provide a rectilinear mapping of the center region, and a second camera having a second image sensor and a first cylindrical lens on a first side of the compressed rectilinear lens to capture a first outer region of the scene onto the second image sensor such as to directly provide a cylindrical mapping of the first outer region, and a third camera having a third image sensor and a second cylindrical lens on a second side of the compressed rectilinear lens to capture a second outer region of the scene onto the third image sensor such as to directly provide a cylindrical mapping of the second outer region. An object detection engine of the device includes one or more cascades of regular object classifiers. One or more objects located within the original wide field of view image appearing on the multiple cameras of the original WFoV image is/are detectable by the object detection engine upon application of the one or more cascades of regular object classifiers.

Another image acquisition device having a wide field of view is provided. The device includes multiple cameras configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°. The original wide field of view image includes a combination of multiple images captured each with one of the multiple cameras. The multiple cameras each utilize a same lens and include a first camera having a first image sensor utilizing a compressed rectilinear portion of the lens to capture a center region of a scene onto the first sensor such as to directly provide a rectilinear mapping of the center region, and a second camera having a second image sensor utilizing a first cylindrical portion of the lens on a first side of the compressed rectilinear portion to capture a first outer region of the scene onto the second image sensor such as to directly provide a cylindrical mapping of the first outer region, and a third camera having a third image sensor utilizing a second cylindrical portion of the lens on a second side of the compressed rectilinear portion to capture a second outer region of the scene onto the third image sensor such as to directly provide a cylindrical mapping of the second outer region.

An object detection engine of the device includes one or more cascades of regular object classifiers. One or more objects located within the original wide field of view image appearing on the multiple cameras of the original WFoV image is/are detectable by the object detection engine upon application of the one or more cascades of regular object classifiers.

Any of the devices described herein may include a full frame buffer coupled with the image sensor for acquiring raw image data, a mixer, and a zoom and pan engine, and/or an object tracking engine, just as any of the methods described herein may include tracking one or more detected objects over multiple sequential frames. Any of the object classifiers described herein may include face classifiers or classifiers of other specific objects. Any of the regular object classifiers described herein may include rectangular object classifiers.
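Rectangular object classifiers of the kind mentioned above are commonly evaluated with an integral image, which returns the pixel sum of any axis-aligned rectangle in four lookups. The disclosure does not spell out the classifier internals, so the following is a generic sketch of that well-known construction with a single two-rectangle Haar-like feature; the function names are illustrative assumptions.

```python
def integral_image(img):
    """Summed-area table with one extra zero row and column.
    ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half.
    w is assumed to be even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

The four-lookup trick only works for axis-aligned rectangles, which is one way to see why distorted face regions call for either a corrected image or reshaped classifiers.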

Exemplary face region images distorted in a manner like the building frontages of FIGS. 1B-1G, i.e. distorted by a WFOV, might have rectilinear distortion similar to FIG. 1C at the edges, and as in FIG. 1B cylindrical projection.

The system shown in FIG. 2 includes a wide field of view (WFOV) lens of for example 120 degrees; a sensor, for example of 3 megapixels or more; a full frame buffer (e.g., from Bayer); a WFOV correction module; a face detector and face tracker; a zoom and pan engine; a mixer and a control module. The WFOV system illustrated at FIG. 2 incorporates a lens assembly and corresponding image sensor which is typically more elongated than a conventional image sensor. The system further incorporates a face tracking module which employs one or more cascades of rectangular face classifiers.

As the system is configured to image a horizontal field of >90-100 degrees or more, it is desired to process the scene captured by the system to present an apparently “normal” perspective on the scene. There are several approaches to this as exemplified by the example drawn from the architectural perspective of a long building described in Appendix A. In the context of our WFOV camera this disclosure is primarily directed at considering how facial regions will be distorted by the WFOV perspective of this camera. One can consider such facial regions to suffer similar distortions to the frontage of the building illustrated in this attached Appendix. Thus the problem of obtaining geometrically consistent face regions across the entire horizontal range of the WFOV camera is substantially similar to the architectural problem described therein.

Thus, in order to obtain reasonable face regions, it is useful to alter/map the raw image obtained from the original WFOV horizontal scene so that faces appear undistorted. Or in alternative embodiments face classifiers may be altered according to the location of the face regions within an unprocessed (raw) image of the scene.

In a first preferred embodiment the center region of the image representing up to 100° of the horizontal field of view (FOV) is mapped using a squeezed rectilinear projection. In a first embodiment this may be obtained using a suitable non-linear lens design to directly project the center region of the scene onto the middle ⅔ of the image sensor. The remaining approximately ⅓ portion of the image sensor (i.e. ⅙ at each end of the sensor) has the horizontal scene projected using a cylindrical mapping. Again in a first preferred embodiment the edges of the wide-angle lens are designed to optically effect said projection directly onto the imaging sensor.
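The sensor split described above can be made concrete with a small amount of arithmetic. The sketch below is an illustration only: it assumes the 120 degree example lens and a center region of exactly 100°, and the 3000-column sensor width used in the usage note is a hypothetical figure, as are the function and key names.

```python
def sensor_layout(width_px, total_fov_deg=120.0, center_fov_deg=100.0):
    """Split sensor columns into a center region (about the middle 2/3)
    and two outer regions (about 1/6 of the width at each end), and
    report the average number of columns per degree in each."""
    if not 0.0 < center_fov_deg < total_fov_deg:
        raise ValueError("need 0 < center_fov_deg < total_fov_deg")
    outer_px = width_px // 6              # 1/6 of the sensor at each end
    center_px = width_px - 2 * outer_px   # the remaining middle portion
    outer_fov_deg = (total_fov_deg - center_fov_deg) / 2.0
    return {
        # Column ranges are half-open: [start, stop)
        "left_outer": (0, outer_px),
        "center": (outer_px, outer_px + center_px),
        "right_outer": (outer_px + center_px, width_px),
        "center_px_per_deg": center_px / center_fov_deg,
        "outer_px_per_deg": outer_px / outer_fov_deg,
    }
```

With a hypothetical 3000-column sensor, the center region would occupy columns 500 to 2499 at an average of 20 columns per degree, while each 10° outer region would occupy 500 columns. Different assumed angles would shift that balance.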

Thus, in a first embodiment, the entire horizontal scene is mapped onto the full extent of the image sensor, as illustrated at FIG. 3(a).
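The hybrid mapping of the first embodiment may be pictured with a short numerical sketch. The function name and the exact projection laws below are illustrative assumptions, not taken from the disclosure: a 120 degree total field of view, a uniformly scaled tan(θ) law standing in for the squeezed rectilinear projection over the central 100 degrees, and a linear-in-angle law standing in for the cylindrical mapping at each end of the sensor.

```python
import math

def hybrid_sensor_x(theta_deg, total_fov_deg=120.0, center_fov_deg=100.0,
                    center_frac=2.0 / 3.0):
    """Map a horizontal field angle (degrees, 0 = optical axis) to a
    normalized sensor coordinate in [-0.5, 0.5].

    All default values are assumptions chosen to match the example figures
    in the text (120 degree lens, 100 degree center, middle 2/3 of sensor).
    """
    half_total = total_fov_deg / 2.0
    half_center = center_fov_deg / 2.0
    angle = abs(theta_deg)
    if angle > half_total:
        raise ValueError("angle lies outside the field of view")
    sign = 1.0 if theta_deg >= 0 else -1.0
    x_edge = center_frac / 2.0  # sensor position where the center region ends
    if angle <= half_center:
        # Rectilinear: position proportional to tan(theta), scaled so the
        # center region ends exactly at x_edge.
        scale = x_edge / math.tan(math.radians(half_center))
        return sign * scale * math.tan(math.radians(angle))
    # Cylindrical: position grows linearly with angle over the outer strip.
    outer = (angle - half_center) / (half_total - half_center)
    return sign * (x_edge + (0.5 - x_edge) * outer)
```

Under these assumptions a ray at 50 degrees lands one third of the sensor width from the center, and a ray at 60 degrees lands at the sensor edge, so the two projections meet continuously at the region boundary.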

Naturally the form and structure of such a complex hybrid optical lens may not be conducive to mass production; thus, in an alternative embodiment, a more conventional rectilinear wide-angle lens is used and the squeezing of the middle ⅔ of the image is achieved by post-processing the sensor data. Similarly the cylindrical projections of the outer regions of the WFOV scene are performed by post-processing. In this second embodiment the initial projection of the scene onto the sensor does not cover the full extent of the sensor and thus a significant portion of the sensor area does not contain useful data. The overall resolution of this second embodiment is reduced and a larger sensor would be used to achieve similar accuracy to the first embodiment, as illustrated at FIG. 3(b).

In a third embodiment some of the scene mappings are achieved optically, but some additional image post-processing is used to refine the initial projections of the image scene onto the sensor. In this embodiment the lens design can be optimized for manufacturing considerations, a larger portion of the sensor area can be used to capture useful scene data and the software post-processing overhead is similar to the pure software embodiment.

In a fourth embodiment multiple cameras are configured to cover overlapping portions of the desired field of view and the acquired images are combined into a single WFOV image in memory. Preferably, this plurality of cameras is configured to have the same optical center, thus mitigating perspective related problems for foreground objects. In such an embodiment techniques employed in panorama imaging may be used advantageously to join images at their boundaries, or to determine the optimal join line where a significant region of image overlap is available. The following cases assigned to the same assignee relate to panorama imaging and are incorporated by reference: Ser. Nos. 12/636,608, 12/636,618, 12/636,629, 12/636,639, and 12/636,647, as are US published application nos. US20060182437, US20090022422, US20090021576 and US20060268130.

In one preferred embodiment of the multi-camera WFOV device three or more standard cameras with a 60 degree FOV are combined to provide an overall horizontal WFOV of 120-150 degrees with an overlap of 15-30 degrees between cameras. The field of view for such a camera array can be extended horizontally by adding more cameras; it may be extended vertically by adding an identical array of 3 or more horizontally aligned cameras facing in a higher (or lower) vertical direction and with a similar vertical overlap of 15-30 degrees, offering a vertical FOV of 90-105 degrees for two such WFOV arrays. The vertical FOV may be increased by adding further horizontally aligned camera arrays. Such configurations have the advantage that all individual cameras can be conventional wafer-level cameras (WLC) which can be mass-produced.
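The field-of-view figures quoted for the multi-camera device follow from simple arithmetic, sketched here. The function name is hypothetical, and the sketch assumes that every adjacent pair of cameras shares the same overlap and, for the vertical case, that each camera also covers 60 degrees vertically.

```python
def combined_fov(n_cameras, camera_fov_deg, overlap_deg):
    """Total angular coverage of n cameras arranged side by side, where each
    adjacent pair shares overlap_deg of its field of view (an assumption)."""
    if n_cameras < 1:
        raise ValueError("at least one camera is required")
    if not 0 <= overlap_deg < camera_fov_deg:
        raise ValueError("overlap must be smaller than a single camera's FOV")
    # Each camera adds its full FOV; each of the n-1 seams is counted twice.
    return n_cameras * camera_fov_deg - (n_cameras - 1) * overlap_deg

# Horizontal: three 60 degree cameras with 30 or 15 degrees of overlap.
horizontal_range = (combined_fov(3, 60, 30), combined_fov(3, 60, 15))
# Vertical: two stacked arrays with the same overlap range.
vertical_range = (combined_fov(2, 60, 30), combined_fov(2, 60, 15))
```

With these inputs the horizontal range evaluates to 120-150 degrees and the vertical range to 90-105 degrees, matching the values stated above.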

In an alternative multi-camera embodiment a central WFOV camera has its range extended by two side cameras. The WFOV camera can employ an optical lens optimized to provide a 120 degree compressed rectilinear mapping of the central scene. The side cameras can be optimized to provide a cylindrical mapping of the peripheral regions of the scene, thus providing a similar result to that obtained in FIG. 3(a), but using three independent cameras with independent optical systems rather than a single sensor/ISP as shown in FIG. 3(b). Again, techniques employed in panorama imaging to join overlapping images can be advantageously used (see the panorama cases referred to above herein).

After image acquisition and, depending on the embodiment, additional post-processing of the image, we arrive at a mapping of the image scene with three main regions. Over the middle third of the image there is a normal rectilinear mapping and the image is undistorted compared to a standard FOV image; over the next ⅓ of the image (i.e. ⅙ of the image on either side) the rectilinear projection becomes increasingly squeezed, as illustrated in FIGS. 1A-1G; finally, over the outer approximately ⅓ of the image a cylindrical projection, rather than rectilinear, is applied.

FIG. 3(a) illustrates one embodiment where this can be achieved using a compressed rectilinear lens in the middle, surrounded by two cylindrical lenses on either side. In a practical embodiment all three lenses could be combined into a single lens structure designed to minimize distortions where the rectilinear projection of the original scene overlaps with the cylindrical projection.

A standard face-tracker can now be applied to the WFOV image, as all face regions should be rendered in a relatively undistorted geometry.

In alternative embodiments the entire scene need not be re-mapped, but instead only the luminance components are re-mapped and used to generate a geometrically undistorted integral image. Face classifiers are then applied to this integral image in order to detect faces. Once faces are detected, those faces and their surrounding peripheral regions can be re-mapped on each frame, whereas it may be sufficient to re-map the entire scene background, which is assumed to be static, only occasionally, say every 60-120 image frames. In this way image processing and enhancement can be focussed on the people in the image scene.
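For readers unfamiliar with the integral image, the following is a minimal sketch of how one may be built from a luminance plane and queried for a rectangle sum in constant time. It assumes the luminance data has already been re-mapped and is held as a list of rows of integers; the function names are illustrative only.

```python
def integral_image(luma):
    """Build a summed-area table with an extra leading row and column of
    zeros, so that ii[y][x] is the sum of luma over rows < y and cols < x."""
    height = len(luma)
    width = len(luma[0]) if height else 0
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += luma[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y),
    obtained from four table look-ups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
```

Rectangular classifiers of the kind discussed here evaluate sums over rectangles of this form, which is why a single undistorted integral image of the luminance plane is sufficient for detection.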

In alternative embodiments it may not be desirable to completely re-map the entire WFOV scene due to the computational burden involved. In such embodiments, reference is made to U.S. Pat. Nos. 7,460,695, 7,403,643, 7,565,030, and 7,315,631 and US published app no. 2009-0263022, which are incorporated by reference along with US20090179998, US20090080713, US 2009-0303342 and U.S. Ser. No. 12/572,930, filed Oct. 2, 2009 by the same assignee. These references describe predicting face regions (determined from the previous several video frames). The images may be transformed using either cylindrical or squeezed rectilinear projection prior to applying a face tracker to the region. In such an embodiment, it may be necessary from time to time to re-map a WFOV image in order to make an initial determination of new faces within the WFOV image scene. However, after such initial determination only the region immediately surrounding each detected face need be re-mapped.

In certain embodiments, the remapping of the image scene, or portions thereof, involves the removal of purple fringes (due to blue shift) or the correction of chromatic aberrations. The following case, assigned to the same assignee, is incorporated by reference and relates to purple fringing and chromatic aberration correction: US20090189997.

In other embodiments a single mapping of the input image scene is used. If, for example, only a simple rectilinear mapping were applied across the entire image scene, the edges of the image would be distorted as in FIG. 1C and only across the middle 40% or so of the image can a conventional face tracker be used. Accordingly the rectangular classifiers of the face tracker are modified to take account of the scene mappings across the other 60% of image scene regions: over the middle portion of the image they can be applied unaltered; over the second 30% they are selectively expanded or compressed in the horizontal direction to account for the degree of squeezing of the scene during the rectilinear mapping process. Finally, in the outer ⅓ the face classifiers are adapted to account for the cylindrical mapping used in this region of the image scene.
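The choice of classifier variant by horizontal position can be sketched as a simple look-up. The region fractions quoted in the text are approximate and do not sum exactly to 100%, so this sketch treats the middle 40% and the outer third as given and assigns whatever lies between them to the rescaled band; that reading, the function name and the returned labels are all assumptions.

```python
def classifier_region(x_center, image_width, central_frac=0.40,
                      outer_frac=1.0 / 3.0):
    """Return which classifier variant applies to a detection window whose
    center lies at horizontal pixel position x_center.

    central_frac -- fraction of the width where classifiers are unaltered
    outer_frac   -- total fraction (both ends together) mapped cylindrically
    """
    # Normalized distance from the image center: 0.0 at center, 0.5 at edge.
    offset = abs(x_center / image_width - 0.5)
    if offset <= central_frac / 2.0:
        return "unaltered"
    if offset >= 0.5 - outer_frac / 2.0:
        return "cylindrical"
    # In between, classifiers are expanded or compressed horizontally.
    return "rescaled"
```

For a 1000 pixel wide image, a window centered at x = 500 would use the unaltered classifiers, one at x = 750 the horizontally rescaled ones, and one at x = 950 the cylindrically adapted ones.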

In order to transform standard rectangular classifiers of a particular size, say 32×32 pixels, it may be advantageous in some embodiments to increase the size of face classifiers to, for example, 64×64. This larger size of classifier would enable greater granularity, and thus improved accuracy in transforming normal classifiers to distorted ones. This comes at the expense of additional computational burden for the face tracker. However, we note that face tracking technology is quite broadly adopted across the industry and is known as a robust and well optimized technology. Thus the trade-off of increasing classifiers from 32×32 to 64×64 for such faces should not cause a significant delay on most camera or smartphone platforms. The advantage is that pre-existing classifier cascades can be re-used, rather than having to train new, distorted ones.

Having greater granularity for the classifiers is advantageous particularly when starting to rescale features inside the classifier individually, based on the distance to the optical center. In another embodiment, one can scale the whole 22×22 classifier (this is a very good size for face classifiers) with fixed dx, dy (computed as distance from the optical center). Having larger classifiers does not put excessive strain on the processing. Advantageously, the opposite is true, because there are fewer scales to cover. In this case, the distance to subject is reduced.

In an alternative embodiment an initial, shortened chain of modified classifiers is applied to the raw image (i.e. without any rectilinear or cylindrical re-mapping). This chain is composed of some of the initial face classifiers from a normal face detection chain. These initial classifiers are also, typically, the most aggressive in eliminating non-faces from consideration. They also tend to be simpler in form; the first four Haar classifiers from the Viola-Jones cascade are illustrated in FIG. 4 (these may be implemented through a 22×22 pixel window in another embodiment).

Where a compressed rectilinear scaling would have been employed (as illustrated in FIG. 1F), it is relatively straightforward to invert this scaling and expand (or contract) these classifiers in the horizontal direction to compensate for the distortion of faces in the raw image scene. (In some embodiments where this distortion is cylindrical towards the edges of the scene, classifiers may need to be scaled in both horizontal and vertical directions.) Further, it is possible, from a knowledge of the location at which each classifier is to be applied and, optionally, the size of the detection window, to perform the scaling of these classifiers dynamically. Thus only the original classifiers have to be stored, together with data on the required rectilinear compression factor in the horizontal direction. The latter can easily be achieved using a look-up table (LUT) which is specific to the lens used.
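Dynamic scaling from a lens-specific LUT might look like the sketch below. The LUT entries are placeholder values invented for illustration and are not measured from any lens; the rectangle format (x, y, w, h, weight) and both function names are likewise assumptions. Only horizontal scaling is shown.

```python
from bisect import bisect_right

# Hypothetical LUT: (normalized distance from image center, horizontal scale).
# Placeholder values only; a real table would be derived from the lens design.
LENS_LUT = [(0.0, 1.0), (0.2, 1.0), (0.3, 0.85), (0.4, 0.7), (0.5, 0.6)]

def horizontal_scale(offset, lut=LENS_LUT):
    """Linearly interpolate the LUT at a normalized offset from the center."""
    keys = [k for k, _ in lut]
    offset = min(max(offset, keys[0]), keys[-1])  # clamp to the table range
    i = bisect_right(keys, offset)
    if i == len(lut):
        return lut[-1][1]
    (k0, v0), (k1, v1) = lut[i - 1], lut[i]
    return v0 + (v1 - v0) * (offset - k0) / (k1 - k0)

def scale_feature(rects, window_x, window_w, image_width, lut=LENS_LUT):
    """Scale the rectangles of one stored classifier feature horizontally,
    according to where its detection window sits in the raw image.

    rects -- list of (x, y, w, h, weight), relative to the detection window
    """
    center = (window_x + window_w / 2.0) / image_width
    s = horizontal_scale(abs(center - 0.5), lut)
    # Rounding to whole pixels may leave small gaps between adjacent
    # rectangles; a production implementation would need to handle that.
    return [(round(x * s), y, max(1, round(w * s)), h, weight)
            for x, y, w, h, weight in rects]
```

Because the scale factor is computed at the moment a window is evaluated, only the original rectangles and the LUT need to be stored, as the text describes.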

This short classifier chain is employed to obtain a set of potential face regions which may then be re-mapped (using, for example, compressed rectilinear compression and/or cylindrical mapping) to enable the remainder of a complete face detection classifier chain to be applied to each potential face region. This embodiment relies on the fact that 99.99% of non-face regions are eliminated by applying the first few face classifiers; thus a small number of potential face regions would be re-mapped rather than the entire image scene before applying a full face detection process.
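The two-stage flow can be outlined as follows. This is a structural sketch only: the classifier stages and the re-mapping function are passed in as hypothetical callables, and nothing here reflects a particular cascade implementation.

```python
def detect_faces(windows, short_chain, remap, remaining_chain):
    """Return the windows accepted by both stages.

    windows         -- iterable of candidate regions taken from the raw image
    short_chain     -- modified classifiers applied directly to raw regions
    remap           -- function that re-maps one raw region (hypothetical)
    remaining_chain -- rest of the cascade, applied to re-mapped regions
    """
    faces = []
    for window in windows:
        # Stage 1: cheap rejection on the raw (distorted) data.
        if not all(stage(window) for stage in short_chain):
            continue
        # Stage 2: only surviving candidates pay the cost of re-mapping.
        corrected = remap(window)
        if all(stage(corrected) for stage in remaining_chain):
            faces.append(window)
    return faces
```

The saving comes from the fact that `remap` is called only for windows that survive the short chain, which the text expects to be a very small fraction of all windows.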

In another embodiment, distortion may be compensated by a method that involves applying geometrical adjustments (as a function of distance to the optical center) when an integral image is computed (in the cases where the template matching is done using the integral image (II)), or compensating for the distortion when computing the sub-sampled image used for face detection and face tracking (in the cases where template matching is done directly on Y data).

Note that face classifiers can be divided into symmetric and non-symmetric classifiers. In certain embodiments it may be advantageous to use split classifier chains. For example, right and left-hand face detector cascades may report detection of a half-face region; this may indicate that a full face is present but the second half is more or less distorted than would be expected, perhaps because it is closer to or farther from the lens than is normal. In such cases a more relaxed half-face or full-face detector may be employed to confirm if a full face is actually present, or a lower acceptance threshold may be set for the current detector. The following related apps assigned to the same assignee are incorporated by reference: US2007/0147820, US2010/0053368, US2008/0205712, US2009/0185753, US2008/0219517 and 2010/0054592, and U.S. Ser. No. 61/182,625, filed May 29, 2009 and U.S. Ser. No. 61/221,455, filed Jun. 29, 2009.

In certain embodiments, when a face is tracked across the scene it may be desired to draw particular attention to that face and to emphasize it against the main scene. In one exemplary embodiment, suitable for applications in videotelephony, there may be one or more faces in the main scene but one (or more) of these is speaking. It is possible, using a stereo microphone, to localize the speaking face.

This face region and the other foreground regions (e.g. neck, shoulders & torso) are further processed to magnify them (e.g., in one embodiment by a factor of x1.8) against the background; in a simple embodiment this magnified face is simply composited onto the background image in the same location as the unmagnified original.

In a more sophisticated embodiment the other faces and the main background of the image are de-magnified and/or squeezed in order to keep the overall image size self-consistent. This may lead to some image distortion, particularly surrounding the “magnified” face, but this helps to emphasize the person speaking, as illustrated in FIGS. 4(a)-4(c). In this case the degree of magnification is generally <x1.5 to avoid excessive distortion across the remainder of the image.

In another embodiment, one can do a background+face mix or combination using an alpha map without worrying about distortions. Then, the face that speaks can be placed at the middle of the frame. In another variation on this embodiment, the degree of magnification can be varied according to the strength or loudness of a speaker's voice, as illustrated at FIGS. 5(a)-5(c).
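A minimal sketch of the alpha-map mix and of a loudness-dependent magnification follows. It assumes single-channel images stored as lists of rows, an alpha map with values in [0, 1], and a loudness level already normalized to [0, 1]; the linear loudness-to-zoom law and the default upper limit of 1.8 (borrowed from the example factor mentioned earlier) are illustrative assumptions.

```python
def alpha_mix(face, background, alpha):
    """Per-pixel blend: alpha = 1 keeps the face layer, alpha = 0 keeps the
    background. All three inputs are equally sized lists of rows."""
    return [[a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
            for f_row, b_row, a_row in zip(face, background, alpha)]

def zoom_from_loudness(loudness, min_zoom=1.0, max_zoom=1.8):
    """Map a normalized loudness level to a magnification factor.
    The linear law and the limits are assumptions for illustration."""
    level = min(max(loudness, 0.0), 1.0)  # clamp out-of-range input
    return min_zoom + (max_zoom - min_zoom) * level
```

Intermediate alpha values around the face boundary give a soft transition between the magnified face layer and the background.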

In other embodiments based on the same scene re-mapping techniques, the rendering of the face region and surrounding portions of the image can be adjusted to emphasize one or more persons appearing in the final, re-mapped image of the captured scene. In one embodiment within a videophone system, a stereo microphone system triangulates the location of the person speaking and a portion of the scene is zoomed by a factor greater than one. The remaining portions of the image are zoomed by a factor less than one, so that the overall image is of approximately the same dimension. Thus persons appearing in the image appear larger when they are talking and it is easier for viewers to focus on the current speaker from a group.
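The relationship between the two zoom factors can be made concrete with a one-dimensional sketch. It assumes that only horizontal widths are considered and that the remainder of the image is squeezed uniformly; the function name is hypothetical.

```python
def background_scale(image_width, region_width, zoom):
    """Scale factor (< 1) to apply to the rest of the image so that the
    total width is unchanged after zooming one region by zoom (> 1)."""
    if not 0 < region_width < image_width:
        raise ValueError("region must be narrower than the image")
    magnified = region_width * zoom
    if magnified >= image_width:
        raise ValueError("magnified region would fill the whole image")
    # Remaining output width divided by remaining input width.
    return (image_width - magnified) / (image_width - region_width)
```

For a 1000 pixel wide image in which a 200 pixel region is zoomed by 1.5, the remaining 800 pixels are scaled by 0.875, giving 300 + 700 = 1000 pixels overall.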

The present invention is not limited to the embodiments described above herein, which may be amended or modified without departing from the scope of the present invention.

In methods that may be performed according to preferred embodiments herein and that may have been described above, the operations have been described in selected typographical sequences. However, the sequences have been selected and so ordered for typographical convenience and are not intended to imply any particular order for performing the operations.

In addition, all references cited above herein, in addition to the background and summary of the invention sections, are hereby incorporated by reference into the detailed description of the preferred embodiments as disclosing alternative embodiments and components. Moreover, as extended depth of field (EDOF) technology may be combined with embodiments described herein into advantageous alternative embodiments, the following are incorporated by reference: US published patent applications numbers 20060256226, 20060519527, 20070239417, 20070236573, 20070236574, 20090128666, 20080095466, 20080316317, 20090147111, 20020145671, 20080075515, 20080021989, 20050107741, 20080028183, 20070045991, 20080008041, 20080009562, 20080038325, 20080045728, 20090531723, 20090190238, 20090141163, and 20080002185.

1. An image acquisition device having a wide field of view, comprising: at least one non-linear lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°, wherein the non-linear lens is configured to project a center region of a scene onto the middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region; a control module; an object detection engine comprising one or more cascades of regular object classifiers; a WFoV correction engine configured to correct distortion within the original WFoV image; wherein the WFoV correction engine processes raw image data of the original WFoV image including applying a cylindrical projection of outer pixels of the original WFoV image, and combining center pixels and modified outer pixels to generate a distortion-corrected WFoV image; and wherein one or more objects located within the center or outer pixels, or both, of the distortion-corrected WFoV image are detectable by the object detection engine upon application of the one or more cascades of regular object classifiers.
2. The device of claim 1, further comprising: a full frame buffer coupled with the image sensor for acquiring raw image data; a mixer; and a zoom and pan engine.
3. The device of claim 1, further comprising an object tracking engine.
4. The device of claim 1, wherein the object classifiers comprise face classifiers.
5. The device of claim 1, wherein the regular object classifiers comprise rectangular object classifiers.
6. An image acquisition device having a wide field of view, comprising: at least one non-linear lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°, wherein the non-linear lens is configured to project a center region of a scene onto the middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region; a control module; an object detection engine comprising one or more cascades of modified object classifiers; wherein the modified object classifiers comprise a subset of cylindrical classifiers to be applied to objects appearing in outer pixels of the WFoV image; and wherein one or more objects located within the center or outer pixels, or both, of the original WFoV image are detectable by the object detection engine upon application of the one or more cascades of modified object classifiers, including a subset of regular classifiers and the subset of cylindrical classifiers, respectively.
7. The device of claim 6, further comprising: a full frame buffer coupled with the image sensor for acquiring raw image data; a mixer; and a zoom and pan engine.
8. The device of claim 6, further comprising an object tracking engine.
9. The device of claim 6, further comprising a WFoV correction engine configured to correct distortion within the original image; and wherein the WFoV correction engine processes raw image data of the original WFoV image including applying a cylindrical mapping of outer pixels of the original WFoV image, and combining center pixels and modified outer pixels to generate a distortion-corrected WFoV image.
10. The device of claim 6, wherein the object classifiers comprise face classifiers.
11. The device of claim 6, wherein the regular object classifiers comprise rectangular object classifiers.
12. A method of acquiring wide field of view images with an image acquisition device having at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°, wherein the method comprises: acquiring the original WFoV image, including utilizing at least one non-linear lens to project a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region; correcting distortion within the original WFoV image by processing raw image data of the original WFoV image including applying a cylindrical projection of outer pixels of the original WFoV image, and combining center pixels and modified outer pixels to generate a distortion-corrected WFoV image; and detecting one or more objects by applying one or more cascades of regular object classifiers to one or more objects located within the center or outer pixels, or both, of the distortion-corrected WFoV image.
13. The method of claim 12, further comprising tracking one or more detected objects over multiple sequential frames.
14. The method of claim 12, wherein the object classifiers comprise face classifiers.
15. The method of claim 12, wherein the regular object classifiers comprise rectangular object classifiers.
16. A method of acquiring wide field of view images with an image acquisition device having at least one lens and image sensor configured to capture an original wide field of view (WFoV) image with a field of view of more than 90°, wherein the method comprises: acquiring the original WFoV image, including utilizing at least one non-linear lens to project a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region; applying one or more modified object classifiers, comprising applying a subset of cylindrical classifiers to objects appearing in outer pixels of the WFoV image; and applying a subset of regular classifiers to objects appearing in center pixels of the WFoV image; detecting one or more objects located within center or outer pixels, or both, of the original WFoV image by the applying of the one or more cascades of modified object classifiers, including the applying of the subset of regular classifiers and the applying of the subset of cylindrical classifiers, respectively.
17. The method of claim 16, further comprising tracking one or more detected objects over multiple sequential frames.
18. The method of claim 16, further comprising correcting distortion within the original WFoV image by processing raw image data of the original WFoV image including applying a cylindrical mapping of outer pixels of the original WFoV image, and combining center pixels and modified outer pixels to generate a distortion-corrected WFoV image.
19. The method of claim 16, wherein the object classifiers comprise face classifiers.
20. The method of claim 16, wherein the regular object classifiers comprise rectangular object classifiers.
21. One or more processor-readable media having embedded therein code for programming a processor to perform a method of processing wide field of view images acquired with an image acquisition device having an image sensor and at least one non-linear lens to project a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region to acquire an original wide field of view (WFoV) image with a field of view of more than 90°, wherein the method comprises: correcting distortion within the original WFoV image by processing raw image data of the original WFoV image including applying a cylindrical projection of outer pixels of the original WFoV image, and combining center pixels and modified outer pixels to generate a distortion-corrected WFoV image; and detecting one or more objects by applying one or more cascades of regular object classifiers to one or more objects located within the center or outer pixels, or both, of the distortion-corrected WFoV image.
22. The one or more processor-readable media of claim 21, wherein the method further comprises tracking one or more detected objects over multiple sequential frames.
23. The one or more processor-readable media of claim 21, wherein the object classifiers comprise face classifiers.
24. The one or more processor-readable media of claim 21, wherein the regular object classifiers comprise rectangular object classifiers.
25. One or more processor-readable media having embedded therein code for programming a processor to perform a method of processing wide field of view images acquired with an image acquisition device having an image sensor and at least one non-linear lens to project a center region of a scene onto a middle portion of the image sensor such as to directly provide a rectilinear mapping of the center region to acquire an original wide field of view (WFoV) image with a field of view of more than 90°, wherein the method comprises: applying one or more modified object classifiers, comprising applying a subset of cylindrical classifiers to objects appearing in outer pixels of the WFoV image; and applying a subset of regular classifiers to objects appearing in center pixels of the WFoV image; detecting one or more objects located within center or outer pixels, or both, of the original WFoV image by the applying of the one or more cascades of modified object classifiers, including the applying of the subset of regular classifiers and the applying of the subset of cylindrical classifiers, respectively.
26. The one or more processor-readable media of claim 25, wherein the method further comprises tracking one or more detected objects over multiple sequential frames.
27. The one or more processor-readable media of claim 25, wherein the method further comprises correcting distortion within the original WFoV image by processing raw image data of the original WFoV image including applying a cylindrical mapping of outer pixels of the original WFoV image, and combining center pixels and modified outer pixels to generate a distortion-corrected WFoV image.
28. The one or more processor-readable media of claim 25, wherein the object classifiers comprise face classifiers.
29. The one or more processor-readable media of claim 25, wherein the regular object classifiers comprise rectangular object classifiers.